LOCI: Fast Outlier Detection Using the Local Correlation Integral
Abstract
Outlier detection is an integral part of data mining and has attracted much attention recently [BKNS00, JTH01, KNT00]. In this paper, we propose a new method for evaluating outlierness, which we call the Local Correlation Integral (LOCI). As with the best previous methods, LOCI is highly effective for detecting outliers and groups of outliers (a.k.a. micro-clusters). In addition, it offers the following advantages and novelties: (a) It provides an automatic, data-dictated cut-off to determine whether a point is an outlier; in contrast, previous methods force users to pick cut-offs, without any hints as to what cut-off value is best for a given dataset. (b) It can provide a LOCI plot for each point; this plot summarizes a wealth of information about the data in the vicinity of the point, determining clusters, micro-clusters, their diameters and their inter-cluster distances. None of the existing outlier-detection methods can match this feature, because they output only a single number for each point: its outlierness score. (c) Our LOCI method can be computed as quickly as the best previous methods. (d) Moreover, LOCI leads to a practically linear approximate method, aLOCI (for approximate LOCI), which provides fast, highly accurate outlier detection. To the best of our knowledge, this is the first work to use approximate computations to speed up outlier detection. Experiments on synthetic and real-world datasets show that LOCI and aLOCI can automatically detect outliers and micro-clusters, without user-required cut-offs, and that they quickly spot both expected and unexpected outliers.
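The abstract does not spell out the data-dictated cut-off, so the following is only a minimal sketch of how it might look. It assumes the multi-granularity deviation factor (MDEF) formulation from the body of the LOCI paper: a point is flagged when its MDEF exceeds k_sigma times the normalized deviation of neighbor counts at some radius. The function names, the brute-force distance scan, and the defaults alpha = 0.5 and k_sigma = 3 are illustrative choices, not the authors' implementation.

```python
import math

def mdef_at_radius(points, i, r, alpha=0.5):
    """Return (MDEF, sigma_MDEF) for points[i] at sampling radius r.

    Assumed definitions (from the LOCI paper body, not this abstract):
      n(q, alpha*r)  = number of points within alpha*r of q (q included)
      n_hat          = mean of n(q, alpha*r) over all q within r of p
      MDEF           = 1 - n(p, alpha*r) / n_hat
      sigma_MDEF     = std of those counts / n_hat
    """
    def count(center, radius):
        # Brute-force neighbor count; O(N) per call.
        return sum(1 for q in points if math.dist(center, q) <= radius)

    p = points[i]
    sampling = [q for q in points if math.dist(p, q) <= r]  # always contains p
    counts = [count(q, alpha * r) for q in sampling]
    mean = sum(counts) / len(counts)                        # >= 1, never zero
    var = sum((c - mean) ** 2 for c in counts) / len(counts)
    mdef = 1.0 - count(p, alpha * r) / mean
    return mdef, math.sqrt(var) / mean

def is_flagged(points, i, radii, alpha=0.5, k_sigma=3.0):
    """Flag points[i] if MDEF > k_sigma * sigma_MDEF at any tested radius."""
    for r in radii:
        mdef, sigma = mdef_at_radius(points, i, r, alpha)
        if mdef > k_sigma * sigma:
            return True
    return False

# Toy data: a 5 x 4 grid cluster plus one isolated point at index 20.
points = [(x, y) for x in range(5) for y in range(4)] + [(20, 20)]
flagged = [i for i in range(len(points)) if is_flagged(points, i, [10.0, 30.0])]
print(flagged)  # [20]
```

The threshold here is derived from the spread of neighbor counts around each point rather than supplied by the user, which is the property advantage (a) describes. This brute-force version is quadratic or worse in the number of points and is meant only to illustrate the criterion.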
Similar resources
Fast Adaptive Algorithm for Robust Evaluation of Quality of Experience
Outlier detection is an integral part of robust evaluation for crowdsourceable Quality of Experience (QoE) and has attracted much attention in recent years. In QoE for multimedia, outliers happen because of different test conditions, human errors, abnormal variations in context, etc. In this paper, we propose a simple yet effective algorithm for outlier detection and robust QoE evaluation named...
AFLP Genome Scanning Reveals Divergent Selection in Natural Populations of Liriodendron chinense (Magnoliaceae) along a Latitudinal Transect
Understanding adaptive genetic variation and its relation to environmental factors are important for understanding how plants adapt to climate change and for managing genetic resources. Genome scans for the loci exhibiting either notably high or low levels of population differentiation (outlier loci) provide one means of identifying genomic regions possibly associated with convergent or diverge...
Outlier Detection with Kernel Density Functions
Outlier detection has recently become an important problem in many industrial and financial applications. In this paper, a novel unsupervised algorithm for outlier detection with a solid statistical foundation is proposed. First we modify a nonparametric density estimate with a variable kernel to yield a robust local density estimation. Outliers are then detected by comparing the local density ...
Cross-species outlier detection reveals different evolutionary pressures between sister species
Lodgepole pine (Pinus contorta var. latifolia) and jack pine (Pinus banksiana) hybridize in western Canada, an area of recent mountain pine beetle range expansion. Given the heterogeneity of the environment, and indications of local adaptation, there are many unknowns regarding the response of these forests to future outbreaks. To better understand this we aim to identify genetic regions that h...
Fast Adaptive Least Trimmed Squares for Robust Evaluation of Quality of Experience
Outlier detection is an integral part of robust evaluation for crowdsourceable Quality of Experience (QoE) and has attracted much attention in recent years. In QoE for multimedia, outliers happen because of different test conditions, human errors, abnormal variations in context, etc. In this paper, we propose a simple yet effective algorithm for outlier detection and robust QoE evaluation named...
Publication date: 2003